Abstract

In this paper, a fast atmospheric and surface temperature retrieval algorithm is developed for the
high resolution Infrared Atmospheric Sounding Interferometer (IASI) space-borne instrument.
This algorithm is constructed on the basis of a neural network technique that has been
regularized by introduction of a priori information. The performance of the resulting fast and 
accurate inverse radiative transfer model is presented for a large diversified dataset of radiosonde 
atmospheres including rare events. Two configurations are considered: a tropical-airmass 
specialized scheme and an all-air-masses scheme. 
1. Introduction

The Infrared Atmospheric Sounding Interferometer
(IASI) is a high resolution (0.25 cm⁻¹) Fourier
transform spectrometer scheduled for flight in 2005
on the European polar METeorological Operational
Platform (METOP-1) satellite funded by the European
organization for METeorological SATellites
(EUMETSAT) and the European Space Agency (ESA)
member states. This instrument is intended to replace
the High Resolution Infrared Radiation Sounder
(HIRS) as the operational infrared sounder and is
expected to reach accuracies of 1 K in temperature and
10 % in water vapor with vertical resolutions of 1 km
and 2 km respectively. IASI, jointly developed by
the Centre National d'Etudes Spatiales (CNES) and
EUMETSAT, provides spectral coverage from 3.5 µm
to 15.5 µm at considerably higher spectral resolution
than HIRS and, together with the Advanced Microwave
Sounding Unit (AMSU), is expected to lead
to dramatic improvements in the accuracy and height
resolution of remotely sensed temperature and humidity
profiles and ozone amount.

The goal of this study is to present an inversion al- 
gorithm that retrieves geophysical variables from IASI 
measurements. We are confronted, in this work, with 
problems related to the ill-posed character of the in- 
verse problem, the sensitivity to noise and, specific 
to IASI, the data dimension. The Multi-Layer Perceptron
(MLP) technique is particularly well suited
to this kind of problem. Such an approach has
already been developed by the Atmospheric Radiation
Analysis (ARA) group of LMD for HIRS coupled
with the Microwave Sounding Unit (MSU) [Escobar
et al., 1993], for the Special Sensor Microwave /
Temperature (SSM/T) instrument on board the Defense
Meteorological Satellite Program (DMSP) [Rieu
et al., 1996], and even for the high resolution infrared
spectrometer Advanced Infrared Radiation Sounder
(AIRS) of the National Aeronautics and Space
Administration (NASA) for the coming Earth Observation
System (EOS-PM-1) [Escobar et al., 1993] or for the IASI
instrument [Aires et al., 1998]. The great advantages
of the MLP are its speed, the small amount of memory
required and the accuracy of its results [Aires, 1999]. The
MLP model is nonlinear, which is a crucial point for
the regression fit to the inverse Radiative Transfer
Equation (RTE). Furthermore, assumptions such as the
linearity of the RTE or Gaussian statistics for the
stochastic variables are not required for the MLP.

In this paper, it is demonstrated that the inversion
procedure can be regularized by introducing various
kinds of a priori information about the physical
problem into the neural method. This may be done
within the three components of the neural network
technique: the architecture of the network, the learning
algorithm and the learning data base. This approach
overcomes the “black-box” modeling conception
often associated with neural network methods.

We present here an application to the retrieval of
the surface temperature and of the atmospheric temperature
profile with the IASI instrument. Previous
studies have used information content analysis
to estimate the expected retrieval errors of IASI [Amato
and Serio, 1997; Prunet et al., 1998], but this
kind of estimate depends on some assumptions
(Gaussian hypothesis, independence of first-guess and
observation, first-guess error covariance matrices often
taken to be diagonal, i.e. no correlations among
the first-guess errors of the variables, etc.), and on the
limited number of atmospheric situations that have
been examined.

Our neural network model is trained and tested
on a large number, 3500, of real atmospheric
situations as measured by radiosondes, taken from
the Thermodynamic Initial Guess Retrieval (TIGR)
data base [Chedin et al., 1985; Achard, 1991; Escobar,
1993b; Chevallier et al., 1998; 2000]. These atmospheric
situations include very complex temperature
profiles that are often much more irregular than
reanalysis data or model output data. Rare situations
are also included so that the dataset represents, as
much as possible, all kinds of possible atmospheric
situations (initially for a pattern recognition purpose).
This complexity represents a higher variability
than that encountered in operational conditions with
model output data, so our estimate of the retrieval
errors could be an over-estimate. However, the use of
a large and complex climatological dataset allows the
inversion model to be calibrated globally and even for
rare events. Furthermore, our analysis of the retrieval
error is made for realistic instrumental noise conditions.
Contrary to other approaches, no assumptions
about the physical problem, such as linearity
or Gaussian statistics, are used.

This paper is organized as follows. The physical
problem associated with our application is presented
in section 2. The neural network approach is described
in section 3. The data bases used in this study are
presented in section 4. Two applications of our neural
technique are then presented: the surface temperature
retrieval (section 5) and the atmospheric temperature
profile retrieval (section 6). Short conclusions
and perspectives are given in section 7.

2. Sounding the Atmosphere with the 
IASI Instrument 

2.1. Radiative Transfer in the Atmosphere 

The radiance measured by an instrument at the 
top of the atmosphere depends on the atmospheric 
and surface properties. This dependence is described 
by the Radiative Transfer Equation (RTE): 

$$ I(\nu) = \varepsilon_s\, B(T_s, \nu)\, \tau_\nu(P_s) + \int_{P_s}^{0} B(T(P), \nu)\, \frac{\partial \tau_\nu(P)}{\partial P}\, dP \tag{1} $$

where $\nu$ is the wavenumber (cm⁻¹), $\varepsilon_s$ the Earth's
surface emissivity, which may be a function of wavenumber,
$B(T(P), \nu)$ the Planck function, which indicates
the radiance emitted by a black body at temperature
T and atmospheric pressure P, and $\tau_\nu$ the transmission
factor between the satellite and the pressure level P
($P_s$ being the surface pressure).
$\partial \tau_\nu / \partial P$ is often referred to as the weighting function.
The RTE expresses the two radiative contributions at
the top of the atmosphere: one arising from the surface
(first term on the right hand side) and one from the
atmosphere (second term on the right hand side). The
equation's complexity lies in the transmission factors,
which depend on pressure, temperature, concentration
of gases and the spectroscopic characteristics of the
absorbing gases (CO2, H2O, O3, ...).
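
As an illustration of equation (1), the following minimal sketch evaluates a discretized form of the RTE for a single wavenumber. It is not the 4A model used in this study: the layer discretization, the function names (`planck`, `toa_radiance`) and the convention that transmittances are given at the layer boundaries are assumptions made for the example.

```python
import math

# Physical constants (SI units)
H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m / s
KB = 1.380649e-23    # Boltzmann constant, J / K


def planck(nu_cm, temp):
    """Black-body radiance at wavenumber nu_cm (cm^-1) and temperature temp (K)."""
    nu = nu_cm * 100.0  # cm^-1 -> m^-1
    return 2.0 * H * C ** 2 * nu ** 3 / math.expm1(H * C * nu / (KB * temp))


def toa_radiance(nu_cm, t_surf, emissivity, layer_temps, level_taus):
    """Discretized equation (1) (hypothetical layering).

    layer_temps: layer temperatures ordered from the surface upwards.
    level_taus:  transmittances to space at the layer boundaries, starting
                 with the surface value and ending with 1.0 at the top, so
                 len(level_taus) == len(layer_temps) + 1.
    """
    # Surface term: emissivity * B(Ts) * surface-to-space transmittance
    surface = emissivity * planck(nu_cm, t_surf) * level_taus[0]
    # Atmospheric term: B(T) weighted by the transmittance change per layer
    atmosphere = sum(
        planck(nu_cm, t) * (level_taus[i + 1] - level_taus[i])
        for i, t in enumerate(layer_temps)
    )
    return surface + atmosphere
```

A simple sanity check of this sketch is the isothermal case: with an emissivity of 1 and the surface at the same temperature as every layer, the weights sum to one and the radiance reduces to the Planck function at that temperature.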

To retrieve atmospheric variables from radiative
measurements at the top of the atmosphere, the inverse
of equation (1) has to be computed. The analytical
inversion of this function is not possible; only
an inference approach can be used [Twomey, 1977].
Contrary to the direct problem, which can advantageously
be estimated with high precision by a physical
algorithm, the inverse problem needs a method of
resolution based on a statistical representation of the
(unknown) inverse equation. Two general approaches
exist: use an inversion scheme for each observation
(we call this approach the local inversion) or model
the inverse RTE once and for all (we call this approach
the global inversion). The local inversion generally
requires a good initial guess solution and a rapid and
accurate direct transfer model [Rodgers, 1976]. Global
inversion models can use a first guess [Aires
et al., 2000], but this is not required, and no direct model
is used during operational use. Although global inversion
does not have these two limitations, it is a more
ambitious problem.


2.2. Instrumental Characteristics 

The two major advances of the IASI instrument 
are: 

• The dramatically increased number of spectral
channels: for each field of view, 8461 measurements
are available (covering the spectral range
from 645 to 2760 cm⁻¹ with an unapodized
resolution of 0.25 cm⁻¹), with hundreds of them
sounding the atmospheric temperature. The
retrieval becomes an over-constrained problem
(more observations than degrees of freedom).

• The increased resolving power: with IASI the
resolving power is about λ/dλ ≈ 1200. Presently,
the resolving power of the TOVS (TIROS-N
Operational Vertical Sounder) radiometer is
between 50 and 100.
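
The channel count quoted above follows directly from the spectral range and the sampling step. The short sketch below (helper names are illustrative, not part of any IASI software) reproduces that arithmetic and the wavenumber-to-wavelength conversion used throughout the paper.

```python
def n_channels(nu_min, nu_max, step=0.25):
    """Number of channels sampled every `step` cm^-1 between nu_min and nu_max inclusive."""
    return round((nu_max - nu_min) / step) + 1


def wavelength_um(nu_cm):
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres."""
    return 1.0e4 / nu_cm
```

With these helpers, `n_channels(645.0, 2760.0)` gives the 8461 measurements per field of view, and `wavelength_um(645.0)` is close to 15.5 µm, the long-wave end of the IASI spectrum.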

It is expected that the vertical resolution and the
accuracy of retrievals will substantially increase: the
IASI mission requirements are an error of 1 K in
atmospheric temperature and 10 % in relative humidity
profiles, with vertical resolutions of 1 km and 2 km
respectively.

The IASI noise is simulated [Cayla et al., 1995] by a
white Gaussian noise (this is a realistic assumption for
interferometers) with a Noise Equivalent Temperature
difference (NEΔT) specified at 280 K (Table 1). The
NEΔT at 280 K represents the standard deviation
$\sigma_{280}(\nu)$ of the Gaussian noise for a
given wavenumber $\nu$. At a different scene brightness
temperature $T_b$, the standard deviation $\sigma_{T_b}(\nu)$ of the
Gaussian noise is computed by:

$$ \sigma_{T_b}(\nu) = \frac{\partial B(T_b = 280, \nu) / \partial T_b}{\partial B(T_b, \nu) / \partial T_b}\; \sigma_{280}(\nu) \tag{2} $$

which shows that the noise level increases as $T_b$ decreases.
Figure 1 illustrates the standard deviation of the
noise at different $T_b$. It is expected that these characteristics
over-estimate the actual noise
level of the instrument. Figure 2 shows the IASI spectrum
averaged over the TIGR dataset with the corresponding
noise standard deviation spectrum. Note
that some spectral regions could have a noise standard
deviation larger than 2 K on average.
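
The scaling of the noise with scene temperature can be sketched as follows. This is only an illustration of the ratio of Planck-function derivatives described above; the function names and the NEΔT value used in the usage note are placeholders, not the Table 1 specifications.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m / s
KB = 1.380649e-23    # Boltzmann constant, J / K


def planck_dT(nu_cm, temp):
    """Derivative of the Planck function with respect to temperature."""
    nu = nu_cm * 100.0                 # cm^-1 -> m^-1
    x = H * C * nu / (KB * temp)       # dimensionless h c nu / (k T)
    ex = math.exp(x)
    return 2.0 * H * C ** 2 * nu ** 3 * (x / temp) * ex / (ex - 1.0) ** 2


def noise_std(nu_cm, t_b, nedt_280):
    """Noise standard deviation (K) at scene temperature t_b, given the NEdT at 280 K."""
    return planck_dT(nu_cm, 280.0) / planck_dT(nu_cm, t_b) * nedt_280
```

For example, with a placeholder NEΔT of 0.3 K, `noise_std(2500.0, 220.0, 0.3)` is much larger than 0.3 K, whereas `noise_std(2500.0, 280.0, 0.3)` returns the reference value; the amplification for cold scenes is strongest at high wavenumbers.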

There are 4 fields of view for each IASI sample,
covering an area of 12 to 9 km at nadir. Assuming
homogeneous meteorological conditions, an average
of the 4 pixel measurements can be used to perform
the retrievals: these 4 fields of view provide redundant
measurements that can be averaged to reduce noise.
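
The benefit of averaging can be quantified under the assumption, made here for illustration, that the noise of the 4 pixels is independent and identically distributed: the standard deviation of the mean is then divided by the square root of the number of pixels, i.e. by 2. The sketch below states this rule and checks it with a small simulation (function names are hypothetical).

```python
import math
import random
import statistics


def averaged_noise_std(sigma, n_pixels=4):
    """Standard deviation of the mean of n_pixels independent noise draws."""
    return sigma / math.sqrt(n_pixels)


def simulate_average(sigma, n_pixels=4, n_trials=20000, seed=0):
    """Empirical standard deviation of the pixel-averaged Gaussian noise."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(0.0, sigma) for _ in range(n_pixels)) / n_pixels
        for _ in range(n_trials)
    ]
    return statistics.pstdev(means)
```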

3. The Neural Network Inversion 
Approach 

Various neural inversion techniques have been developed,
such as the “Iterative Inversion” [Kindermann
and Linden, 1990], the “Distal Learning” [Jordan and
Rumelhart, 1992] or the “Distal Learning” optimized
by a Monte-Carlo algorithm [Hidalgo and Gomez-Treviño,
1996]. We have chosen to use the “Direct Inversion”
approach for two reasons: it performs a global inversion
and it is possible to introduce a priori information
into the method. A priori knowledge is any
information about the solution of the problem which
is in addition to the information contained in the data
set. As in usual statistical techniques (like regression),
overcoming the “black-box” modeling conception (no
assumptions about the physical problem) improves results.
Therefore, we have combined three approaches:
the structural stabilization of the network, the regularization
of the learning algorithm by the input perturbation
technique and a physically optimized feature
selection process in the IASI data. Our numerical experiments
have shown that the introduction of this
kind of a priori information is very useful and makes
training possible with relatively few data.

3.1. Global Inversion 

In the “Direct Inversion” technique, an MLP neural
network is used to estimate directly the mapping between
the IASI observations and the retrieved geophysical
variables. In effect, the “trained” MLP is a statistical
model of the inverse RTE, providing once and for all
a global inversion. The learning algorithm (the most
computationally expensive part) is performed off-line
only once. Then, the application of the neural network
model for the inversion of IASI observations is
quasi-immediate in the operational stage: no regressions
and no Jacobian computations are required.

Another advantage over classical physico-statistical
techniques is that a good initial condition for the inversion
is not needed. Moreover, the required memory
storage is very small. There is also no need for
a rapid direct model (necessary in iterative inversion
algorithms), where the speed is usually obtained by
linearizing the RTE and assuming uncorrelated Gaussian
errors.


3.2. MLP and Structural Stabilization of the 
Architecture 

The MLP network is a mapping model composed
of parallel processors called “neurons”. These processors
are organized in distinct layers: the first layer
(number 0) represents the input $X = (x_i;\ 0 < i \le m_0)$
of the mapping, where $m_0$ is the number of neurons
in layer 0. The last layer (number L) represents the
output of the mapping $Y = (y_k;\ 0 < k \le m_L)$. The
intermediate layers (0 < m < L) are called the “hidden
layers”. These layers are connected via neuronal
links (Figure 3): two neurons i and j in two
consecutive layers have a synaptic connection associated
with a synaptic weight $\omega_{ij}$. A neuron executes
two simple operations: first, it computes a weighted sum
of its inputs and then transfers this signal to its output
through a so-called transfer, sigmoid or activation
function such as $\sigma(a) = \tanh(a)$. The neuron j of
a hidden layer or of the output layer has an output $z_j$
given by

$$ z_j = \sigma\Big( \sum_{i \in \mathrm{Inputs}(j)} \omega_{ij}\, z_i \Big). $$

Generally, for regression problems, the output units have
a transfer function that is the identity. For example, in a
one-hidden-layer MLP, the output $y_k$ of the network is
defined as:

$$ y_k(x) = \sum_{j \in S_1} \omega_{jk}\, \sigma(a_j) = \sum_{j \in S_1} \omega_{jk}\, \sigma\Big( \sum_{i \in S_0} \omega_{ij}\, x_i \Big) \tag{3} $$

where $\sigma$ is the sigmoid function, $a_j$ is the activity
of neuron j and $S_i$ is the i-th layer of the network
(with i = 0 for the input layer). We have deliberately
omitted the usual bias term in this formula to
simplify notation. It has been demonstrated [Hornik
et al., 1989; Cybenko et al., 1989] that any continuous
function can be represented by a one-hidden-layer
MLP.
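
The forward pass of such a one-hidden-layer network can be written in a few lines. The sketch below is a minimal illustration under the conventions stated above (tanh hidden units, identity output units, no bias term); the function name and the nested-list weight layout are choices made for the example.

```python
import math


def mlp_forward(x, w_hidden, w_out):
    """Output of a one-hidden-layer MLP without bias terms.

    x:        list of m0 input values.
    w_hidden: w_hidden[j][i] is the weight from input i to hidden neuron j.
    w_out:    w_out[k][j] is the weight from hidden neuron j to output k.
    """
    # Hidden layer: weighted sum of the inputs passed through tanh
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # Output layer: weighted sum with an identity transfer function
    return [sum(w * zj for w, zj in zip(row, hidden)) for row in w_out]
```

For instance, `mlp_forward([1.0], [[1.0]], [[2.0]])` returns a single output equal to `2 * tanh(1)`.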

The neuron acts, in its entire input space, as a
“fuzzy” linear discriminant: a neuron j cuts its input
space into two half subspaces separated by a
plane orthogonal to the vector of its input weights
$\{\omega_{ij};\ i \in \mathrm{Inputs}(j)\}$. On one side of the “frontier” the
response of the neuron is 0, on the other side the response
is 1 and in the “fuzzy frontier” the response of
the neuron is quasi-linear (corresponding to the linear
part of the transfer function). So, the MLP network,
like linear regression, is very well adapted to high-dimension
data because its neurons act in the entire
data space and not in a partition of this space like
some methods (radial basis functions, spline interpolators,
etc.).
How is the neural network structure defined? Theoretically,
it has been demonstrated [Sontag, 1991]
that any inverse problem can be resolved by a two- 
hidden layer MLP network since such neural networks 
can take into account discontinuities and extremely 
nonlinear variations (often present in inverse prob- 
lems), in contrast to one-hidden layer MLPs that ap- 
proximate continuous functions. 

In practice, the answer can be different: we have
observed in our experiments that with noise-corrupted
data, one hidden layer can be sufficient. Furthermore,
our experiments show that smooth solutions
are obtained with one hidden layer. This limitation
in the number of hidden layers is a structural
stabilization: the resulting reduction of the number of
free parameters (the synaptic weights W) regularizes
the neural estimation, producing a functional equivalence
between the desired function (the inverse of the
RTE) and its estimation (the trained neural network).

3.3. Learning Algorithm and Regularization 
by Input Perturbation 


Given an architecture (number of layers, neurons
and connections), all the information of the network
is contained in the weights W (the set of all synaptic
weights $\omega_{ij}$). The learning algorithm is the optimization
technique that estimates the optimal network
parameters $W = \{\omega_{ij}\}$ by minimizing a loss function
C(W) so that the neural mapping approaches as
closely as possible the desired function. The most frequently
used criterion to adjust W is the mean square
error in network outputs:

$$ C(W) = \frac{1}{2} \sum_{k=1}^{m_L} \iint \big( y_k(x; W) - t_k \big)^2\, P(t_k \mid x)\, P(x)\, dt_k\, dx \tag{4} $$

with $t_k$ the k-th desired output component, $y_k$ the k-th
neural output component and P(·) the probability
density function of input data x. Practically, C(W)
is approximated over the N examples $(x^e, t^e)$ of the
learning base by:

$$ C(W) = \frac{1}{2N} \sum_{e=1}^{N} \sum_{k=1}^{m_L} \big( y_k(x^e; W) - t_k^e \big)^2 \tag{5} $$

The Error Back-Propagation (BP) Algorithm [Rumelhart
et al., 1986] is used to minimize C(W). It is a
stochastic steepest descent method very well adapted
to this neural architecture because the computational
cost is linearly related to the number of parameters.

To reduce the estimation sensitivity to input noise
in the data, we use the Input Perturbation technique.
It is a heuristic method to control the effective complexity
of the neural network mapping. The technique
consists, during the learning step, in adding to each
input a random vector representing the instrumental
noise. It has been demonstrated [Bishop, 1996]
that, under certain conditions (low noise assumption),
training with noise is closely related to regularization
(or smoothing) techniques. In the Input Perturbation
method, the usual error function C(W) (equation (4))
takes the form:

$$ \tilde{C}(W) = \frac{1}{2} \sum_{k=1}^{m_L} \iiint \big( y_k(x + \eta; W) - t_k \big)^2\, P(t_k \mid x)\, P(x)\, P(\eta)\, dt_k\, dx\, d\eta \tag{6} $$

If the noise $\eta$ is sufficiently small, we can expand the
network function $y_k(x + \eta; W)$ to first order. Then, we
obtain the relation:

$$ \tilde{C}(W) \simeq C(W) + v \cdot \Omega(W) \tag{7} $$

where $v$ is the noise variance and

$$ \Omega(W) = \frac{1}{2} \sum_{i=1}^{m_0} \sum_{k=1}^{m_L} \int \Big( \frac{\partial y_k}{\partial x_i} \Big)^2 P(x)\, dx \tag{8} $$

is a Tikhonov penalty term (i.e. a stabilizer) which
avoids solutions with high gradients (rapid variations
of the neural function). So the minimization of
this new criterion $\tilde{C}(W)$ constrains the solutions to
be smooth. This regularization technique limits the
number of degrees of freedom in the neural network
to bring its complexity nearer to the desired function.
This limitation reduces the class of possible solutions
and makes the solution of the problem unique.
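
To make the Input Perturbation idea concrete, the sketch below trains a small one-hidden-layer, single-output MLP by stochastic gradient descent, drawing a fresh Gaussian perturbation of the input at every presentation of an example. It is a toy illustration, not the code used for the experiments of this paper: the network size, learning rate, noise level and function names are all assumptions, and a single noise standard deviation stands in for the channel-dependent instrumental noise.

```python
import math
import random


def train_with_input_noise(samples, n_hidden=5, noise_std=0.1,
                           lr=0.05, epochs=200, seed=0):
    """Stochastic back-propagation with Gaussian input perturbation.

    samples: list of (x, t) pairs, x a list of inputs and t a scalar target.
    Returns the hidden weights w1[j][i] and the output weights w2[j].
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    for _ in range(epochs):
        for x, t in samples:
            # Input perturbation: a new noise draw at every presentation
            xn = [xi + rng.gauss(0.0, noise_std) for xi in x]
            z = [math.tanh(sum(w * xi for w, xi in zip(row, xn))) for row in w1]
            y = sum(w * zj for w, zj in zip(w2, z))
            err = y - t  # gradient of 0.5 * (y - t)^2 with respect to y
            for j in range(n_hidden):
                # Back-propagate through the output weight and tanh
                grad_hidden = err * w2[j] * (1.0 - z[j] ** 2)
                w2[j] -= lr * err * z[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_hidden * xn[i]
    return w1, w2


def predict(x, w1, w2):
    """Network output for a clean (unperturbed) input x."""
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(w * zj for w, zj in zip(w2, z))
```

The noise is only injected during learning; `predict` is applied to the inputs as they are, which mirrors the operational use of the trained network.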

3.4. Feature Selection for Dimension
Reduction

An MLP neural network can, in principle, be used
to map any input vector space to any output vector
space; however, in practice, the data representation
significantly affects the quality of the final results. In
particular, care must be exercised to avoid an
over-emphasis on the noise component. Dimension reduction
techniques can be used to present not only a more
compact representation but also more pertinent information
to the input of the neural network.

The curse of dimensionality states that it is
hard to apply a statistical technique to data in a
high-dimensional space. We have seen in section 3.2 that
the MLP is a well-adapted technique for this kind of
problem, but practical problems still occur for high
dimensional data: for example, the number of parameters
(the weights W in the MLP neural network)
increases with the number of inputs. This can allow
an excess of degrees of freedom in the neural interpolator
which, when combined with the introduction of
non-informative data (i.e. noise or spectral information
unrelated to the retrieved quantities), may distort the
learning process: the quality criterion is more difficult
to minimize and the computations are longer.

Thus, the goal of dimension reduction is to present
to the neural network the most relevant information
from the initial raw data (i.e. noisy physical measurements).
There exist two ways to reduce the dimension
of the input data [Jain and Zongker, 1997]: feature
extraction (a transformation of the raw data by an operator,
linear or not) and feature selection (selection of
channels in the input data) [Bishop, 1996]. Feature selection
is chosen here: for the retrieval of one geophysical
variable, we select channels that are, as far as possible,
uniquely sensitive to this one atmospheric parameter.
By studying the RTE Jacobians (derivatives of
the transmittances with respect to each geophysical
parameter), it is possible to analyze mutual information
between measured brightness temperatures and
geophysical variables [Cheruy et al., 1993]. But we
need to make a compromise between reducing the data
dimension and preserving the redundant information
in the raw data to alleviate the effects of noise.

4. Radiosonde-Based Learning and 
Test Datasets 

4.1. Construction of an IASI Learning Data 
Set: the TIGR Data Base 

We use in our application the three TIGR (Ther- 
modynamical Initial Guess Retrieval) data bases of 
the ARA group: TIGR1 (861 atmospheres) [Chedin et 
al., 1985], its 1990 revised version TIGR2 (1761 atmo- 
spheres: 322 in tropical air-mass, 388 in mid-latitude 
type 1, 354 in mid-latitude type 2, 104 in polar type 
1 and 593 in polar type 2) [Achard, 1991; Escobar et 
al., 1993] and its 1997 extended version TIGR3 (2311 
atmospheres: same as TIGR2 but with an extended 
tropical air-mass of 872 atmospheres) [Chevallier et 
al., 1998]. All of these datasets are drawn from
more than 150,000 radiosonde measurements, sampled
for their diversity, and described by their temperature
and gas concentration profiles, with the
atmosphere discretized into 40 layers (see Table 2).
This sampling includes a large number of rare events.
The final data base is composed of 3494 complex
atmospheres. The minimum and maximum envelopes
of the TIGR3 atmospheric temperature profiles are
represented in Figure 4 to illustrate the large range
of variability that the radiosonde measurements
represent. Not only can the range of variability be
extreme, but inversions in the profiles can also produce
complicated structures that are very challenging for
any retrieval method.

The 4A (Automatized Atmospheric Absorption Atlas)
line-by-line forward radiative transfer algorithm
[Scott and Chedin, 1981; Tournier, 1995] has been
used to compute the IASI brightness temperatures
associated with these 3494 atmospheres for clear
conditions over the sea. The 4A algorithm allows for
an analytical computation of the physical Jacobians
(first derivatives of the transmittance with respect to
each variable like temperature, gas concentration, etc.)
[Cheruy et al., 1995]. An illustration of such Jacobians
versus pressure is given in Figure 5 for the spectral
region 650 - 800 cm⁻¹ (15.5 µm - 12.5 µm).
The vertical integration of the atmospheric information
is illustrated in Figure 6, where Jacobians for 6
wavenumbers in the 15.5 µm - 12.5 µm spectral
region are shown. Channels with a limited extent
(mostly in the lower atmosphere), in terms of vertical
resolution, provide more precise information than
the others (at the top of the atmosphere) because a
flat Jacobian indicates ambiguities in the retrieved
profile. The spacing of the peaks is also important
to reduce ambiguities. The concept of vertical resolution
depends on both the width and the spacing of
the channels' Jacobians [Rodgers, 1990].

4.2. Improved Representation of the surface 
temperature in TIGR 

In the current TIGR data base, the surface temperature
Ts has been set equal to the temperature of the
40th (lowest) atmospheric level, T40. This does not
represent the actual situation, especially over land,
where the surface skin temperature can differ significantly
from the near-surface air temperature in systematic
ways with time of day, latitude, season and
location (see for example [Rossow et al., 1989]). For
better representativeness, we statistically generate, for
each atmosphere, a set of 10 different Ts using the
T40 information, based on the statistical distribution
(i.e. mean and standard deviation) of (T40-Ts) in a
data base of 150,000 radiosonde measurements. Thus,
for every atmosphere, knowing T40, we randomly
draw 10 Ts from the estimated probability density.
For example, in the tropical air-mass, we obtain a Ts
data base of 3220 atmospheres (322 x 10).
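
The generation of the surface temperatures can be sketched as follows. The text only specifies that the mean and standard deviation of (T40-Ts) are used; the Gaussian shape of the distribution, the function names and the numerical values in the usage note are assumptions made for this example.

```python
import random


def generate_surface_temps(t40, diff_mean, diff_std, n_draws=10, rng=None):
    """Draw n_draws values of Ts, assuming (T40 - Ts) ~ N(diff_mean, diff_std)."""
    rng = rng or random.Random()
    return [t40 - rng.gauss(diff_mean, diff_std) for _ in range(n_draws)]


def expand_database(t40_values, diff_mean, diff_std, n_draws=10, seed=0):
    """Return (T40, Ts) pairs, n_draws per atmosphere."""
    rng = random.Random(seed)
    return [
        (t40, ts)
        for t40 in t40_values
        for ts in generate_surface_temps(t40, diff_mean, diff_std, n_draws, rng)
    ]
```

Applied to 322 tropical values of T40, for instance `expand_database([300.0] * 322, 0.5, 1.0)` with placeholder statistics, this yields the 3220 (T40, Ts) pairs mentioned above.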





5. Surface Temperature Retrieval 

This study is limited to clear-sky oceanic situations
and to the tropical air-mass case; the emissivity is
set equal to 1.0. Ts in the tropical air-mass is very
important for climatological analyses.

5.1. Jacobian-Based Channel Selection 

There are two spectral regions sensitive to the
surface characteristics in the IASI spectral domain:
12.5 µm - 10.2 µm (~ 800 - 980 cm⁻¹) and
4.0 µm - 3.6 µm (~ 2500 - 2750 cm⁻¹). It is
worth noting that the second spectral region can be
contaminated by solar radiation during the day. However, in
these regions, some wavelengths are contaminated by
other atmospheric constituents. To eliminate the corrupted
channels and to reduce the dimensionality (as
explained in section 3.4), we use a channel selection
process based upon an analysis of the wavelength sensitivity
to Ts variations. We define sensitivity as the
mean variation of I(ν) for a 1 K change of Ts (see equation
(1)). We select, in these two windows, all channels
with a sensitivity higher than a fixed threshold
(Figure 7). 357 channels are obtained in the first window
(with a threshold of 70 %, which achieves a good
compromise) and 262 in the second window (with a
threshold of 85 %, because channels are more sensitive
to surface temperature in this window).
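
The selection rule described above can be sketched as a simple thresholding of per-channel sensitivities. In this illustration the sensitivity is estimated by finite differences from two sets of simulated brightness temperatures (reference and Ts increased by 1 K), and the threshold is expressed as a fraction of the 1 K perturbation; both choices, and the function names, are assumptions rather than the exact procedure of the paper.

```python
def mean_sensitivity(bt_reference, bt_perturbed):
    """Per-channel mean change in brightness temperature for a +1 K change of Ts.

    Both arguments are lists over atmospheres of lists over channels.
    """
    n_atm = len(bt_reference)
    n_chan = len(bt_reference[0])
    return [
        sum(bt_perturbed[a][c] - bt_reference[a][c] for a in range(n_atm)) / n_atm
        for c in range(n_chan)
    ]


def select_channels(sensitivities, threshold):
    """Indices of the channels whose sensitivity reaches the threshold."""
    return [i for i, s in enumerate(sensitivities) if s >= threshold]
```

With a threshold of 0.70, a channel that responds by 0.75 K to a 1 K surface change is kept, while a channel responding by 0.2 K (strongly affected by atmospheric absorption) is discarded.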

5.2. Network Learning and Testing 

The TIGR data base (section 4.2) is divided into 
a learning base of 3000 atmospheres to make the re- 
gression and a base of 220 atmospheres to test the 
generalization ability of the trained neural mapping. 

To retrieve the Ts variable, we use a one-hidden-layer
MLP neural network (see section 3.2 for structural
stabilization). For the first window (800-980
cm⁻¹) the neural structure is 357-20-1: 357 neurons
in the input layer (357 selected brightness temperatures),
20 neurons in the hidden layer and 1 neuron
in the output layer (representing Ts). For the second
window (2500-2750 cm⁻¹), the structure is 262-20-1.

This neural mapping is then trained by the Error
Back-Propagation algorithm on the learning base.
The Input Perturbation regularization technique is
used: simulated noise (according to the NEΔT specifications)
is added to the input data during the learning
step. The generalization ability of our model was
then tested on noisy data computed on the 220 test
atmospheres. The instantaneous retrieval of Ts from
noisy data gives a generalization RMS error of approximately
0.4 K. Similar results are obtained using only
the second spectral window. Without noise, the RMS
error is less than 0.3 K, i.e., the retrieval error is significantly
affected by the measurement error, and not only by the error
of the neural regression fit.

6. Atmospheric Temperature Profile 
Retrieval 

The 40 layers of 4A (see Table 2) were used to compute
the brightness temperature spectrum for clear
conditions over ocean, but for the retrieval, the vertical
discretization of the atmosphere has been changed
(from 4A levels to 1 km levels) to match IASI specifications.
The objective of this section is then to
retrieve the 32 lower atmospheric temperatures of the
1 km layer profiles.

6.1. Channel Selection 

The choice of the channels for the retrieval of temperature
profiles is made so that they are, as much as
possible, sensitive to only one constant-concentration
gas; then, variations of I(ν) in equation (1) result
mainly from temperature variations. Thus, the “CO2,
NO2 (or both) absorbing spectral regions” are used
for the retrieval of atmospheric temperature profiles:
the 15.5 µm - 12.5 µm (~ 645 - 800 cm⁻¹) and the
4.7 µm - 4.0 µm (~ 2100 - 2500 cm⁻¹) spectral
regions.

To present the most relevant information to the 
neural network inputs (section 3.4), we use a chan- 
nel selection process. The feature selection method is 
based on the study of the Jacobians in order to define 
the sensitivity of a channel to atmospheric tempera- 
ture. The mean Jacobian in TIGR3 indicates the sen- 
sitivity relation between atmospheric layers and chan- 
nels. The standard deviation of the Jacobian (around 
the mean) is negligible except near the surface; this 
means that the mean Jacobian is robust to the at- 
mospheric situation except in the lower atmospheric 
layers. 

The feature selection process has two steps. First,
channels are selected which satisfy quality criteria,
i.e. which are as unambiguous as possible: (1) the
Jacobian extent of a channel (characterized by the
area below the Jacobian) and the Jacobian width at
mid-height have to be smaller than fixed thresholds;
(2) the Jacobian center of a channel is not near the
surface; and (3) the Jacobian has a single peak. For the
15.5 µm - 12.5 µm spectral region, we have selected
442 channels among the 621 channels of the spectral
range (645 - 800 cm⁻¹ with 0.25 cm⁻¹ resolution).
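
The three quality criteria can be expressed as a simple test on a discretized Jacobian. The sketch below is one possible reading of them: the extent is taken as the sum of the Jacobian values, the mid-height width as the number of levels at or above half of the maximum, and "near the surface" as the last `n_surface_levels` levels. These definitions, the thresholds and the function name are assumptions for illustration only.

```python
def jacobian_quality(jacobian, max_area, max_width, n_surface_levels=2):
    """Return True if a channel's Jacobian passes the three quality criteria.

    jacobian: non-negative values per level, index 0 at the top of the
              atmosphere and the last index at the surface.
    """
    peak = max(jacobian)
    peak_idx = jacobian.index(peak)
    # (1) extent (area below the Jacobian) and width at mid-height
    area = sum(jacobian)
    width = sum(1 for v in jacobian if v >= 0.5 * peak)
    # (2) the peak must not lie in the levels closest to the surface
    near_surface = peak_idx >= len(jacobian) - n_surface_levels
    # (3) single peak: a simple count of interior local maxima
    n_peaks = sum(
        1
        for i in range(1, len(jacobian) - 1)
        if jacobian[i] > jacobian[i - 1] and jacobian[i] >= jacobian[i + 1]
    )
    return (area <= max_area and width <= max_width
            and not near_surface and n_peaks == 1)
```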

The second step chooses a vertically uniform subset
of the channels that meet the quality criteria. The
IASI instrument gives little information at pressures
below 10 hPa, so our retrievals are limited to the pressure range
1013-10 hPa (32 layers with a discretization of 1 km).
We have chosen 9 channels for each of 30 layers (the
previous 32 layers minus the two lowest layers, which are
sensitive to surface temperature) between 1013 and 10
hPa. The final number of channels is 270. However,
it is important to note that layers 23-28 have a
deficit in channels and that the sensitivity is higher
in the lower atmosphere (Figure 8).
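
A vertically uniform subset can be obtained by grouping the channels according to the layer in which their Jacobian peaks and keeping a fixed number per layer. The following sketch illustrates this; the ranking of candidates by a score, the dictionary layout and the function name are assumptions, since the text does not state how the 9 channels of each layer were chosen.

```python
def uniform_subset(channel_peak_layers, channel_scores, layers, per_layer=9):
    """Pick at most per_layer channels for each layer.

    channel_peak_layers: dict channel -> index of the layer where its Jacobian peaks.
    channel_scores:      dict channel -> quality score (larger is better, hypothetical).
    layers:              iterable of layer indices to cover.
    """
    selected = {}
    for layer in layers:
        candidates = [c for c, l in channel_peak_layers.items() if l == layer]
        # Keep the best-scoring candidates; a layer with few candidates
        # simply receives fewer channels (a deficit)
        candidates.sort(key=lambda c: channel_scores[c], reverse=True)
        selected[layer] = candidates[:per_layer]
    return selected
```

With 30 layers and 9 channels per layer, a complete selection contains 30 x 9 = 270 channels, the number quoted above.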

The 4.7 µm - 4.0 µm spectral region is also important
for the atmospheric temperature profile retrieval
for two reasons. First, the lower atmospheric Jacobians
are narrower than in the 15.5 µm - 12.5 µm
region, allowing for a better vertical resolution. Second,
the channels are less affected by water vapor.

However, due to the larger noise in this spectral
domain, the channel selection has to be performed
differently than in the 15.5 µm - 12.5 µm region. The
IASI noise (see section 2.2), i.e. the standard deviation of
the Gaussian noise, may be as large as a few degrees
for channels sensing the higher layers (low brightness
temperatures). The redundancy of the information
due to the number of channels does not compensate
for this noise. Consequently, the spectral range used
covers mainly the lower atmospheric layers. The Jacobian
analysis selects channels in the 2140-2240 cm⁻¹
spectral range (401 channels).

6.2. Network Learning and Testing 

All the atmospheres of the learning and the test 
bases are described by 30 atmospheric temperatures 
(4A levels up to 7 hPa for 32 Km height) and the 
corresponding 671 selected brightness temperatures 
computed by 4A. The neural network structure used 
for the regression is then 671-50-30: 671 units in 
the input layer (the 671 selected channels in the 
15.5 pm — 12.5 pm and the 4.7 pm — 4.0 pm spectral 
regions), 50 units in the hidden layer and 30 units in 
the output layer (the 30 lower atmospheric tempera- 
tures in 4A-levels, the interpolation to 32 lKm-level 
being made a posteriori). 

We have tested four configurations: “All-air-masses”
and “Tropical-air-mass”, each with and
without the 4-pixel averaging (noise divided by 2,
see section 2.2).
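
The factor of 2 follows from averaging 4 measurements: if the noise is independent from pixel to pixel (an assumption of this sketch), the standard deviation of the mean is reduced by the square root of 4. The short numerical check below uses an illustrative noise level of 0.28 K.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.28  # K, illustrative single-pixel noise standard deviation

# 200000 synthetic scenes, each observed by 4 pixels with independent noise.
noise = rng.normal(0.0, sigma, size=(200_000, 4))

single_std = noise[:, 0].std()           # noise of one pixel
averaged_std = noise.mean(axis=1).std()  # noise of the 4-pixel average
ratio = single_std / averaged_std        # close to sqrt(4) = 2
```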


6.2.1. “All-air-masses” configuration. We 
have merged the TIGR1 and TIGR3 data bases of 
section 4.1 and the resulting 3155 atmospheres have 
been randomly subdivided into a learning base of 2700 
atmospheres and a test base of 455 atmospheres. 
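
A random subdivision of this kind can be sketched as follows. The function name and the seed are hypothetical; only the sizes (3155 atmospheres split into 2700 and 455) come from the text.

```python
import numpy as np

def split_learning_test(n_total, n_learning, seed=0):
    """Randomly split indices 0..n_total-1 into learning and test subsets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_total)
    return shuffled[:n_learning], shuffled[n_learning:]

learn_idx, test_idx = split_learning_test(3155, 2700)
```

The same helper applies to any other base size by changing the two arguments.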

The RMS fit errors (given for the 32 atmospheric
1 km layers) for the learning and the test sets are
shown in Figure 9-A for the 1-pixel configuration and
Figure 9-B for the 4-pixel configuration. There is
overall good agreement between the computed and
observed temperature profiles: RMS errors are close to
1 K on average (lower than 1.3 K except
near 10 hPa). However, problems remain
in two vertical regions:

• In the upper layers of the atmosphere: IASI pro- 
vides poor information above 20 hPa (see Fig- 
ure 5) due to the fact that the Jacobians of the 
channels sounding these layers are more verti- 
cally extensive than channels near surface and 
their amplitudes are smaller. So, the compen- 
sation phenomenon is more important in this 
vertical region. Some of our experiments have 
shown that the addition of the AMSU/A (also 
planned for flight on board METOP-1) infor- 
mation improves results in this vertical region. 

• In the near-surface layers: the difference T40 ≠
Ts complicates the retrieval, due in part to the
compensation phenomenon (an under-estimation
of temperature in one layer is compensated by
an over-estimation in a nearby layer). Considerations
about the compensation phenomenon specific to
neural networks are given in [Aires et al.,
1999; Aires, 1999]. It is possible that the
simultaneous retrieval of Ts and T40, being more
constrained, may solve this problem.

Thus, even though the TIGR database contains
atmospheric situations with highly variable temperature
profiles, the RMS errors obtained in Figures 9-A
and 9-B are close to the IASI objective (1 K of RMS
error for 1 km in vertical resolution).
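
For reference, the per-layer RMS error profile discussed above can be computed as in the following sketch, where the array layout (one row per atmosphere, one column per layer) and the function name are assumptions.

```python
import numpy as np

def rms_error_profile(retrieved, reference):
    """RMS error per layer over a set of atmospheres.

    Both inputs have shape (n_atmospheres, n_layers), in K.
    """
    retrieved = np.asarray(retrieved, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.sqrt(np.mean((retrieved - reference) ** 2, axis=0))

# Tiny synthetic example: 3 atmospheres, 2 layers.
reference = np.zeros((3, 2))
retrieved = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]])
rms = rms_error_profile(retrieved, reference)
```

In this toy case the first layer has an RMS error of 1 K and the second of sqrt(4/3) K, about 1.15 K.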

The use of 4-pixel averages uniformly decreases (by
about 0.1 K) the RMS in the atmospheric layers. This
relatively small improvement is due to the fact that
the regularization of the solution by the input
perturbation method, used to avoid noise effects, is
sufficiently efficient, so the reduction of noise by pixel
averaging has a reduced impact on the quality of the
retrievals. This means that our method is able to provide
good results for each pixel, which makes it possible to
maximize the horizontal resolution or to perform scene
selection. Five randomly chosen examples of retrievals
in the test set are shown in Figure 10.
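
The general idea of perturbing the inputs during learning can be sketched as follows. This is only an illustration under stated assumptions: the noise is taken as Gaussian with a fixed standard deviation per channel, the function name is hypothetical, and the actual input perturbation scheme of this study may differ in its details.

```python
import numpy as np

def perturb_inputs(tb, noise_std, rng):
    """Add zero-mean Gaussian noise to brightness temperatures.

    tb: array (n_samples, n_channels), noise-free brightness temperatures in K.
    noise_std: array (n_channels,), assumed per-channel noise standard deviation.
    A fresh draw would be made each time a sample is presented to the network.
    """
    tb = np.asarray(tb, dtype=float)
    return tb + rng.normal(0.0, 1.0, size=tb.shape) * noise_std

rng = np.random.default_rng(0)
tb_clean = np.full((5000, 3), 250.0)       # synthetic noise-free inputs
noise_std = np.array([0.28, 0.5, 1.1])     # illustrative noise levels in K
tb_noisy = perturb_inputs(tb_clean, noise_std, rng)
empirical_std = (tb_noisy - tb_clean).std(axis=0)
```

Training on such perturbed inputs discourages the network from relying on spectral details that are smaller than the instrument noise.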

6.2.2. “Tropical-air-mass” configuration. We
have merged the Tropical-TIGR1 and the Tropical-TIGR3
data bases of section 4.1 and the resulting
1070 atmospheres have been randomly subdivided
into a learning base of 1000 atmospheres and a test
base of 70 atmospheres.

The RMS errors (given for the 32 atmospheric
1 km layers) in the learning and the test sets are given
in Figure 9-C for the 1-pixel configuration and in
Figure 9-D for the 4-pixel configuration. We see
that the RMS profile is significantly improved at 1 K,
so the specialization of the neural network to the
tropical air mass is important. As above, the RMS is also
decreased by about 0.1 K with the 4-pixel average
configuration.

It is important to note that the specialization of
the neural network on one air mass:

• improves the retrievals; 

• requires a training data base with a larger num- 
ber of atmospheres. 

In this case, the 1070 tropical atmospheres are not 
sufficient, so differences between the learning and the 
test bases are not negligible. Future work should ad- 
dress this very important problem of the full repre- 
sentativity of the learning and testing bases. 

7. Conclusion and Perspectives 

A neural network approach uses a maximum of a 
priori information to limit the number of free param- 
eters in the neural model so as to constrain the re- 
trieval of surface and atmospheric temperatures as a 
“better-posed” problem. The method is trained us- 
ing the TIGR data base, i.e. a vast and complex 
set of atmospheric situations (from radiosonde mea- 
surements which are much more irregular than model 
output) with a wide range of radiosonde conditions 
including rare events. This fact is important to judge
the quality of our results. The surface temperature
for tropical situations displays an RMS error of 0.4 K
(for instantaneous retrievals). Results for atmospheric
temperature profile retrievals are given for four
configurations (“All-air-masses” or “Tropical-air-mass”,
with and without the 4-pixel average). Results are
close to the specifications of the WMO for the
“All-air-masses” configurations: 1 K of error for the
instantaneous temperature retrieval with 1 km vertical
resolution. The specialization to the “Tropical-air-mass”
significantly improves the results, which means that
using a specialized neural network for each of a few
different air masses is a good strategy, but a larger dataset
is then required to train these specialized models. It
is important to note that the results obtained for the
IASI retrievals entirely depend on the complexity
of the dataset used to perform the statistics. Thus,
this work has demonstrated the potential
of the IASI instrument to achieve the WMO
specifications for realistic conditions, even for the complex
situations included here. This new instrument is a
clear advance over current instruments. The MLP
inversion technique developed here for the processing
of IASI observations is flexible enough to introduce a
priori information in the retrieval scheme, is robust
to noise, and is accurate and very fast.

We plan to use a separate neural network for each of
the two other air masses (temperate and polar) by
increasing the TIGR data base. Another idea is to use
this methodology with more channels so as to retrieve
not only the surface temperature and the temperature
profile, but also water vapor and ozone profiles. The
simultaneous retrieval of these variables is expected to
exploit the correlations between variables in order to
better constrain the inversion process. Considerable
improvements are expected from the parallel use of
AMSU/A observations. Finally, further improvement
may also be expected from the introduction of a
first-guess solution in the MLP inversion [cf. Aires et al.,
2000].

Table 1.


ν (cm^-1)   NEΔT (K)      ν (cm^-1)   NEΔT (K)

   650        0.28          1660        0.34
   770        0.28          2090        0.5
   790        0.34          2100        0.5
   980        0.34          2420        0.5
  1000        0.28          2430        0.6
  1070        0.28          2500        0.77
  1080        0.34          2600        1.1
  1200        0.34          2700        1.58
  1210        0.28          2760        1.97
  1650        0.28





Table 2. The 40 pressure levels of the 4A algorithm

Level  Pressure  Altitude      Level  Pressure  Altitude
        (hPa)      (km)                (hPa)      (km)

  1       0.05     68.4          21     131.20    14.1
  2       0.09     64.3          22     161.99    12.6
  3       0.17     59.9          23     200.00    11.1
  4       0.30     56.0          24     222.65    10.4
  5       0.55     51.8          25     247.90     9.7
  6       1.00     47.7          26     275.95     8.9
  7       1.50     44.9          27     307.20     8.2
  8       2.23     42.2          28     341.99     7.4
  9       3.33     39.4          29     380.73     6.7
 10       4.98     36.6          30     423.85     6.0
 11       7.43     33.9          31     471.86     5.2
 12      11.11     31.1          32     525.00     4.5
 13      16.60     28.3          33     584.80     3.7
 14      24.79     25.6          34     651.04     3.0
 15      37.04     22.8          35     724.73     2.3
 16      45.73     21.3          36     800.00     1.6
 17      56.46     19.9          37     848.69     1.2
 18      69.71     18.4          38     900.33     0.8
 19      86.07     17.0          39     955.12     0.4
 20     106.27     15.5          40    1013.00     0.0


Figure 1. Standard deviation of IASI instrument noise for different brightness temperature measurements

Figure 2. Mean IASI spectrum (left) and corresponding standard deviation of IASI instrumental noise (right)

Figure 3. Architecture of a MLP neural network with L layers, with inputs X and outputs Y

Figure 4. Envelope of TIGR3 atmospheric temperature profiles for A “All-Air-Masses”, B “Tropical Air-Mass”, C “Temperate 1 Air-Mass”, D “Temperate 2 Air-Mass”, E “Polar 1 Air-Mass”, and F “Polar 2 Air-Mass”

Figure 5. Mean (for TIGR3 atmospheres) temperature Jacobian in the 15.5 µm - 12.5 µm spectral region

Figure 6. Atmospheric temperature Jacobian profile for IASI and [illegible] channels in the 15.5 µm - 12.5 µm region

Figure 7. [illegible] of surface temperature versus wavenumber, in the [illegible] IASI [illegible]

Figure 8. [illegible] the 270 [illegible] (ordered by maximum absorption altitude)

Figure 9. RMS error profile for the atmospheric temperature retrieval in the learning set (continuous line) and in generalization set (discontinuous line): A for configuration “All-air-masses/1 pixel”, B for configuration “All-air-masses/4 pixels”, C for configuration “Tropical-air-mass/1 pixel”, and D for configuration “Tropical-air-mass/4 pixels”

Figure 10. Five atmospheric temperature profile retrieval examples in the configuration “All-air-masses/1 pixel”